2.2 Data source and cleaning
2.2.1 Data Cleaning
[🔗Github Link]
Cleaning
Before starting the Exploratory Data Analysis (EDA) phase, we needed to confirm that the dataset was clean and appropriately structured. The steps are listed below:
Filter to the required subreddits, r/Dogecoin and r/Cryptocurrency.
Get all posts from r/Dogecoin.
Get only posts from r/Cryptocurrency which contain ‘doge’ or ‘dogecoin’.
Remove posts with missing values or ‘[deleted]’.
Convert dates from Unix format to a YYYY-mm-dd-hh format (this is needed for time-specific analyses)
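The cleaning steps above can be sketched in plain Python. This is a minimal illustration rather than the project's actual code (which lives in the notebook and appears to use Spark): the function names, the `created_hour` output field, and the choice to check only `title` and `selftext` for missing or ‘[deleted]’ values are assumptions.

```python
from datetime import datetime, timezone

KEEP_ALL = "dogecoin"            # every post from this subreddit is kept
KEEP_IF_DOGE = "cryptocurrency"  # posts here are kept only if they mention doge


def to_hour_string(unix_ts):
    """Convert a Unix timestamp (seconds) to a yyyy-mm-dd-hh string in UTC."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%Y-%m-%d-%H")


def clean_posts(posts):
    """Apply the subreddit filter, drop missing/deleted posts, add an hour string."""
    cleaned = []
    for post in posts:
        title, body = post.get("title"), post.get("selftext")
        # Assumption: only title and selftext are checked for missing values.
        if title is None or body is None or "[deleted]" in (title, body):
            continue
        subreddit = str(post.get("subreddit", "")).lower()
        text = f"{title} {body}".lower()
        if subreddit == KEEP_ALL:
            keep = True
        elif subreddit == KEEP_IF_DOGE:
            # 'doge' is a substring of 'dogecoin', so one check covers both.
            keep = "doge" in text
        else:
            keep = False
        if keep:
            cleaned.append({**post, "created_hour": to_hour_string(post["created_utc"])})
    return cleaned
```

Matching on the lower-cased subreddit name avoids depending on how the subreddit is capitalised in the raw data.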
Merging
We merged the submissions and comments datasets on the post ID, which is stored as id in the submissions dataset and as link_id in the comments dataset. The entire data cleaning process is documented in the ‘project_eda_cleaning.ipynb’ notebook. The characteristics of the merged dataset are listed below.
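As a rough illustration of the merge, the sketch below joins comments to their parent submission in plain Python. Several details are assumptions rather than confirmed parts of the project: that link_id carries Reddit's `t3_` prefix and must be stripped before matching, that the join is an inner join, and that comment columns receive a `com_` prefix (suggested by the schema, which also contains a `com_submis_id` column). The function names are hypothetical.

```python
def strip_fullname_prefix(link_id):
    """Drop a leading 't3_' so the value is comparable to a submission id."""
    return link_id[3:] if link_id.startswith("t3_") else link_id


def merge_posts_comments(submissions, comments):
    """Inner-join comments to submissions on the submission id."""
    by_id = {s["id"]: s for s in submissions}
    merged = []
    for comment in comments:
        submis_id = strip_fullname_prefix(comment["link_id"])
        post = by_id.get(submis_id)
        if post is None:
            continue  # comment whose parent post was filtered out
        row = dict(post)
        # Prefix comment columns so they do not overwrite submission columns.
        row.update({f"com_{key}": value for key, value in comment.items()})
        row["com_submis_id"] = submis_id
        merged.append(row)
    return merged
```

Each output row repeats the submission fields once per matching comment, which is consistent with a merged table that has far more rows than there are posts.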
Summary
The dataset has 587,972 rows and 19 columns. The majority of posts and comments are from r/Dogecoin (487,037); the rest are from r/Cryptocurrency (100,935).
Variable list
The schema and the variable types are listed below.
subreddit: string (nullable = true)
subreddit_id: string (nullable = true)
id: string (nullable = true)
created_utc: long (nullable = true)
author: string (nullable = true)
is_self: boolean (nullable = true)
num_comments: long (nullable = true)
score: long (nullable = true)
selftext: string (nullable = true)
title: string (nullable = true)
com_subreddit: string (nullable = true)
com_subreddit_id: string (nullable = true)
com_id: string (nullable = true)
com_created_utc: long (nullable = true)
com_author: string (nullable = true)
com_link_id: string (nullable = true)
com_score: long (nullable = true)
com_body: string (nullable = true)
com_submis_id: string (nullable = true)
Generate New Variables
We created several new variables for use in the analysis, as described below.
Buy signals (buy_sig): Whether the post or any of its comments contains any of these keywords: buy|bought|moon|hold|call|bull|like|yolo
Contains ‘doge|dogecoin’: Whether a post or comment mentions the word ‘doge’ or ‘dogecoin’.
Post activity per minute (hour): The average number of comments made on a post per minute (hour), computed by dividing the total number of comments by the time between the post’s creation timestamp and the timestamp of its last comment.
Day, month and hour: As described above, the UTC timestamp is converted to yyyy-mm-dd-hh.
Percentage of posts from r/Dogecoin (pct_post_rdoge): The proportion of posts in each subreddit.
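The derived variables above can be sketched as follows. This is a minimal illustration under stated assumptions: the keyword list is applied as a case-insensitive substring match (so ‘like’ would also match ‘likely’), a post with no elapsed time yields no activity rate, and the function names are hypothetical rather than taken from the notebook.

```python
import re

# Keyword alternation as listed in the text; matching is case-insensitive.
BUY_PATTERN = re.compile(r"buy|bought|moon|hold|call|bull|like|yolo", re.IGNORECASE)
DOGE_PATTERN = re.compile(r"doge", re.IGNORECASE)  # also matches 'dogecoin'


def buy_sig(post_text, comment_texts):
    """True if the post or any of its comments contains a buy keyword."""
    texts = [post_text, *comment_texts]
    return any(BUY_PATTERN.search(text or "") for text in texts)


def contains_doge(text):
    """True if the text mentions 'doge' or 'dogecoin'."""
    return bool(DOGE_PATTERN.search(text or ""))


def comments_per_hour(num_comments, post_created_utc, last_comment_utc):
    """Average comments per hour between post creation and the last comment."""
    elapsed_seconds = last_comment_utc - post_created_utc
    if elapsed_seconds <= 0:
        return None  # no elapsed time, so the rate is undefined
    return num_comments / (elapsed_seconds / 3600)
```

Dividing the hourly rate by 60 gives the per-minute version. If substring matches are too noisy, wrapping the alternation in `\b(?: ... )\b` restricts it to whole words, at the cost of missing forms such as ‘buying’.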
2.2.2 Price Query
[🔗Github Link]
Using the CryptoCompare API, we collected hourly price data for both Bitcoin and Dogecoin throughout 2023, yielding a dataset of 8,762 entries. Retrieval was split into six sessions to stay within the API’s limitation of 2,000 requests per session. Because Bitcoin and Dogecoin prices differ greatly in absolute value, the visualization uses dual y-axes so that both price trajectories can be compared within a single plot.
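Splitting a long hourly range into capped batches can be sketched as below. The sketch only computes the time windows; the actual API call is omitted, and the project's real session boundaries are not stated, so the function name, its parameters, and the example numbers are purely illustrative.

```python
HOUR = 3600  # seconds


def hourly_batches(start_ts, end_ts, max_points=2000):
    """Split an inclusive hourly range into (first_ts, last_ts) windows.

    Each window covers at most `max_points` hourly timestamps.
    """
    if end_ts < start_ts:
        raise ValueError("end_ts must not precede start_ts")
    batches = []
    first = start_ts
    while first <= end_ts:
        last = min(first + (max_points - 1) * HOUR, end_ts)
        batches.append((first, last))
        first = last + HOUR  # next window starts one hour later
    return batches
```

For example, a range of 5,000 hourly points with a cap of 2,000 yields three windows of 2,000, 2,000 and 1,000 points; each window would then be passed to one retrieval session.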
Dogecoin vs. Bitcoin Price
The analysis shows that both Bitcoin and Dogecoin declined in value in 2023, coinciding with the broader shift from a bullish to a bearish cryptocurrency market. Dogecoin was notably more volatile than Bitcoin, which may reflect the fact that its valuation is influenced far more by community sentiment than by intrinsic economic factors. Dogecoin’s price also surged during the FTX crisis, suggesting that it may respond to specific market events.
Dogecoin vs. Bitcoin Growth Rate
To quantify the observed trends, we computed the growth rate from period-over-period price differences. The results reinforce the preliminary findings: Dogecoin is markedly more susceptible to fluctuations in response to market events such as regulatory changes or major scandals. The comparison highlights how differently Bitcoin and Dogecoin behave under the same market conditions, and points to community engagement and external events as drivers of volatility in digital currencies.
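A growth rate based on periodic differences is commonly defined as the change from one period to the next divided by the earlier value. The sketch below assumes that definition; the exact formula used in the project is not stated, and the function name is hypothetical.

```python
def growth_rates(prices):
    """Return (p_t - p_{t-1}) / p_{t-1} for each consecutive pair of prices."""
    rates = []
    for previous, current in zip(prices, prices[1:]):
        if previous == 0:
            rates.append(None)  # rate is undefined when the base price is zero
        else:
            rates.append((current - previous) / previous)
    return rates
```

Because the rate is relative to each coin's own previous price, it puts Bitcoin and Dogecoin on a common scale despite the large gap in their absolute prices.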